library(tidyverse)
library(readxl)
library(scales)
library(ggmap)
library(maps)
library(usmap)
library(plotly)
EV_Chargers <- read_excel('data/EV Chargers_Last updated 07-30-2021.xlsx', sheet = 2)
EV_Sales <- read_excel('data/New ZEV Sales_Last updated 07-30-2021.xlsx', sheet = 2)
EV_Sales_Zip <- read_excel('data/New ZEV Sales_Last updated 07-30-2021.xlsx', sheet = 3)
Analyze the trends of electric car sales before 2021
#Filter out Hydrogen Cars. Not include 2021 since 2021 has not ended
EV_Sales <- EV_Sales %>%
filter(FUEL_TYPE != "Hydrogen") %>%
filter(`Data Year` != 2021)
#Total Car Sales Per Year.
EV_Sales_Year <- EV_Sales %>%
group_by(`Data Year`) %>%
summarize(total_sales = sum(`Number of Vehicles`)) %>%
arrange(desc(`Data Year`))
EV_Sales_Year %>% ggplot(aes(x = `Data Year`, y = total_sales)) +
geom_line(colour = "blue") +
labs(x = "Year",
y = "Total Sales of Cars",
title = "Total Sales of Cars Per Year")
Electric cars sales are growing exponentially in California since 2010. Between 2017-2018, there is a huge increase in electric car sales. We will look at why there is such big gap between 2017 and 2018.
#Filter by Brand
EV_Make <- EV_Sales %>%
group_by(MAKE) %>%
summarize(total = sum(`Number of Vehicles`)) %>%
arrange(desc(total))
#Only choose sales from top brands
EV_Make = head(EV_Make, 10)
ggplot(data = EV_Make, aes(x = reorder(MAKE, -total), y = total, fill = MAKE)) +
geom_col() +
labs(x = "Model",
y = "Total Sales of Car",
title = "Total Numbers of Electric Cars Sold") + scale_y_continuous(labels = comma) +
theme(axis.text.x = element_text(angle = 45), legend.position = "none")
Overall, Tesla outperform all other car companies in electric car sales.Tesla’s sales is almost triple than the second place, Chevrolet.
head(EV_Sales_Year, 5)
## # A tibble: 5 × 2
## `Data Year` total_sales
## <dbl> <dbl>
## 1 2020 144099
## 2 2019 145263
## 3 2018 154747
## 4 2017 91464
## 5 2016 71723
EV_Make_Year_Telsa_Model_3 <- EV_Sales %>%
group_by(MAKE, `Data Year`, MODEL) %>%
summarize(total = sum(`Number of Vehicles`)) %>%
arrange(desc(total)) %>%
filter(MAKE == 'TESLA' & `Data Year` %in% c(2017, 2018) & MODEL == "Model 3")
EV_Make_Year_Telsa_Model_3
## # A tibble: 2 × 4
## # Groups: MAKE, Data Year [2]
## MAKE `Data Year` MODEL total
## <chr> <dbl> <chr> <dbl>
## 1 TESLA 2018 Model 3 49710
## 2 TESLA 2017 Model 3 840
There is a big increase of car sales from 2017 to 2018. Model 3 caused the increase (almost 50K) from 2017 to 2018. Model 3 is a game changer for Tesla or even the entire market. Since 2018, Tesla sold the most electric car overall.
#Analyze all types of Tesla cars and put it into pie chart
Tesla_Models <- EV_Sales %>%
filter(MAKE == "TESLA") %>%
group_by(MODEL) %>%
summarize(total = sum(`Number of Vehicles`)) %>%
arrange(desc(total)) %>%
mutate(total_model = sum(total)) %>%
mutate(perc = round(total/total_model, 2))
plot_ly(data = Tesla_Models, labels = ~MODEL, values = ~perc, type = "pie", sort = FALSE)
Overall, Model 3 contributes more than 50% of the Tesla sales. Roadster contributed to the 0% of the Tesla market due to supply chain shortages, so it is not available for the market to purchase. Model Y only contributes to 7% of the market because Model Y is the new model that was released in 2019; however the CEO of Tesla expected Model Y to be popular as Model X soon.
Analyze where all car companies stand before the release of Tesla Model 3
#Filter data from 2012 and 2017
EV_Make_Year_before_2017 <- EV_Sales %>%
group_by(MAKE, `Data Year`) %>%
summarize(total = sum(`Number of Vehicles`)) %>%
arrange(desc(total)) %>%
filter(`Data Year` <= 2017 & `Data Year` >= 2012)
EV_Cars_Sales_Before_2017 <- EV_Make_Year_before_2017 %>%
group_by(`MAKE`) %>%
summarize(total_sales = sum(total)) %>%
arrange(desc(total_sales))
EV_Cars_Sales_Before_2017 <- head(EV_Cars_Sales_Before_2017, 10)
ggplot(data = EV_Cars_Sales_Before_2017, aes(x = reorder(MAKE, -total_sales), y = total_sales, fill = MAKE)) +
geom_col() +
labs(x = "Model",
y = "Total Sales of Car",
title = "Total Numbers of Electric Cars Sold before 2018") + scale_y_continuous(labels = comma) +
theme(axis.text.x = element_text(angle = 45), legend.position = "none")
Compared to the previous bar graph, Tesla is the second place between 2012 and 2017. Chevrolet seems the first place between 2012 and 2017. Tesla Model 3 is a game changer to Tesla which brought Tesal to become the number 1 company that sold the most electric cars.
We choose top 6 brands to compare total sales from each year: Tesla, Toyota, Chevrolet, BMW, NISSAN
EV_Sales_Year_Model <- EV_Sales %>%
group_by(`Data Year`, MAKE) %>%
summarize(total_sales = sum(`Number of Vehicles`)) %>%
filter(MAKE %in% (c("TESLA", "Toyota", "CHEVROLET", "BMW", "NISSAN", "FORD"))) %>%
filter(`Data Year` >= 2015)
ggplot(data = EV_Sales_Year_Model, aes(x = MAKE, y = total_sales, fill = MAKE)) +
geom_col() + facet_wrap(~`Data Year`, nrow=1) +
theme(axis.text.x = element_text(angle = 90), legend.position = "none") +
labs(x = "Model", y = "Total Sales of Cars")
Except Tesla, sales for each brand is not increasing but decreasing, but only Tesla increases, which means that Tesla really dominated the market from Model 3. Additionally, Tesla is the company that made full electric cars, while other five companies built hybrid cars as well. That only means people are purchasing more full electric cars.
Find numbers of EV charging station at each county, and how it could influence electric car sales
#Total EV Chargers in each county
EV_Charger <- EV_Chargers
EV_Charger <- EV_Chargers %>%
select(c(County, Total)) %>%
arrange(desc(Total)) %>%
filter(County != 'Total')
head(EV_Charger, 10)
## # A tibble: 10 × 2
## County Total
## <chr> <dbl>
## 1 Los Angeles 20193
## 2 Santa Clara 16202
## 3 San Diego 7199
## 4 Orange 5477
## 5 San Mateo 4459
## 6 Alameda 3560
## 7 Sacramento 1852
## 8 San Francisco 1671
## 9 Contra Costa 1461
## 10 Riverside 1441
Analysis: Los Angeles, San Clara, and San Diego are the top 3. Only 4 counties have more than 5000 EV charging stations. LA has the most EV charging stations.
Draw a map visualization to demonstrate which county has the most EV charging station
states <- map_data("state")
ca_df <- subset(states, region == "california")
counties <- map_data("county")
ca_county <- subset(counties, region == "california")
ca_base <- ggplot(data = ca_df, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "gray")
EV_Charger$County <- tolower(EV_Charger$County)
EV_Chargers_County <- EV_Charger %>%
inner_join(ca_county, by = c("County" = "subregion"))
ditch_the_axes <- theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.title = element_blank()
)
Map_EV_Charging_Station <- ca_base +
geom_polygon(data = EV_Chargers_County, aes(fill = Total), color = "black") +
geom_polygon(color = "black", fill = NA) +
theme_bw() +
scale_fill_gradient(low = "light blue", high = "red") +
ditch_the_axes
Map_EV_Charging_Station
Map Car Sales
#Total Car Sales Per County
EV_Sales_County <- EV_Sales %>%
group_by(County) %>%
summarize(total_sales = sum(`Number of Vehicles`)) %>%
arrange(desc(total_sales))
EV_Sales_County_1 <- EV_Sales_County
EV_Sales_County_1$County <- tolower(EV_Sales_County_1$County)
EV_Sales_County_1 <- EV_Sales_County_1 %>%
inner_join(ca_county, by = c("County" = "subregion"))
Map_EV_Sales_County <- ca_base +
geom_polygon(data = EV_Sales_County_1, aes(fill = total_sales), color = "black") +
geom_polygon(color = "black", fill = NA) +
theme_bw() +
scale_fill_gradient(low = "light blue", high = "red") +
ditch_the_axes
Map_EV_Sales_County
Step: Analyze the relationship between EV charging station and number of electric cars sold
EV_charger_car_sales <- EV_Chargers %>%
inner_join(EV_Sales_County, by = "County")
#The relationship between EV_Sales_County and EV_Chargers. Compare the proportion of charging_station vs prop_total.using log
ggplot(EV_charger_car_sales, aes(x = log(Total), y = log(total_sales))) +
geom_point() + labs(x = "Number of Charging Stations", y = "Total Sales of Cars",
title = "Numbers of Charging Stations vs Total Sales of Cars")
Analysis: There is a positive correlation between EV charging station and number of electric cars sold. We also recommend car companies to target counties that have high EV charging stations and high total car sales, such as Los Angeles, Santa Clara, Orange, etc. If there are more EV charging stations, car owners can easily find one to charge their cars.
Prediction: how many cars can be sold in the next five year. Provide a recommendation to car companies about where to target potential customers
EV_Sales_Year_desc <- EV_Sales_Year %>%
arrange(`Data Year`) %>%
filter(`Data Year` >= 2011)
lm.year <- lm(`total_sales` ~ `Data Year`, data = EV_Sales_Year_desc)
future_sales <- data.frame(predict(lm.year))
#next 10 years
Year <- seq(2021,2029)
future_sales <- data.frame(future_sales)
#added total_sales from 2020
future_sales <- data.frame(Year, head(future_sales,9) + 144099)
#change the column names
colnames(future_sales) <- c("Data.Year", "total_sales")
EV_Sales_Year <- data.frame(EV_Sales_Year)
#combine the past and the predicted data
EV_Sales_Future_Current <- rbind(EV_Sales_Year, future_sales) %>%
arrange(desc(`Data.Year`))
head(future_sales, 6)
## Data.Year total_sales
## 1 2021 146912.9
## 2 2022 163938.9
## 3 2023 180964.8
## 4 2024 197990.7
## 5 2025 215016.6
## 6 2026 232042.6
EV_Sales_Future_Current <- EV_Sales_Future_Current %>%
filter(`Data.Year` <= 2025)
EV_Sales_Future_Current %>% ggplot(aes(x = `Data.Year`, y = total_sales)) +
geom_line(colour = "blue") +
labs(x = "Year",
y = "Total Sales of Cars",
title = "Total Sales of Cars Per Year")
We used linear regression model to predict electric car sales in the next five years. Our independent variable is year, and dependent variable is sales. However, there can be some policies that can affect the popularity of electric cars, or the shortages of materials to build the cars. Thus, we do not want to predict for the next 10 years or so.
Analyze car sales by zip code
#Filter by Car Zip. Find out why
EV_Sales_Zip_Top_20 <- EV_Sales_Zip %>%
group_by(ZIP) %>%
summarize(total = sum(`Number of Vehicles`)) %>%
arrange(desc(total)) %>%
head(20)
EV_Sales_Zip_Top_20
## # A tibble: 20 × 2
## ZIP total
## <dbl> <dbl>
## 1 94539 7200
## 2 95014 6326
## 3 94568 5402
## 4 92618 5365
## 5 95120 5236
## 6 95035 4892
## 7 95070 4810
## 8 94087 4571
## 9 92620 4554
## 10 94536 4441
## 11 92130 4356
## 12 94582 4140
## 13 90266 4102
## 14 94025 3995
## 15 90049 3969
## 16 95124 3968
## 17 95125 3923
## 18 94555 3609
## 19 94538 3582
## 20 95051 3546
8 zips are from Santa Clara County, and 2 from Irvine (Orange County). We believe there is one or two zip codes from LA because LA is more diverse county. There are a few ways that car companies can target their customers by using our analysis. They can target their customers from zip codes that have high electric car sales, or target zip codes that have less electric cars but living in affluent communities, which requires more analysis.
To further expand this analysis, we can analyze what kinds of people purchase electric cars since not everyone can afford to purchase electric cars. Analyzing car sales by counties might be too generic because each city is different from other cities. Therefore, we can take consideration of people’s income in each city for car companies in order to target their consumers.